Dimensionality Reduction for Long Duration and Complex Spatio-temporal Queries Technical Report 592 Ghazi Al-naymat and Sanjay Chawla University of Sydney
Abstract
From the tracking of moose in Sweden to the movement of traffic in a large metropolis, spatio-temporal data is continuously being collected and made available in the public domain. This provides an opportunity to mine and query spatio-temporal data with the purpose of finding substantial patterns and understanding the underlying data-generating process. An important class of queries is based on the flock pattern. A flock is a large subset of objects moving along paths close to each other for a certain pre-defined time. The standard approach to processing a "flock query" is to map spatio-temporal data into a high-dimensional space and reduce the query to a sequence of standard range queries that can be answered using a spatial indexing structure. However, as is well known, the performance of spatial indexing structures deteriorates drastically in high-dimensional space. In this paper we propose a preprocessing strategy that uses a random projection to reduce the dimensionality of the transformed space. We prove a "2 − δ" probabilistic approximation which results from the projection and present experimental results which show, for the first time, the possibility of breaking the curse of dimensionality in a spatio-temporal setting.
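The pipeline outlined in the abstract (map each trajectory to a high-dimensional point, project to fewer dimensions, then run range queries) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian projection matrix with variance 1/k, and the brute-force range query standing in for a spatial index are all hypothetical choices made for this example.

```python
import math
import random


def trajectory_to_point(trajectory):
    """Flatten [(x1, y1), ..., (xT, yT)] into a single 2T-dimensional point."""
    return [coord for xy in trajectory for coord in xy]


def random_projection_matrix(d, k, seed=0):
    """Build a k x d matrix with i.i.d. N(0, 1/k) entries.

    The Gaussian distribution and the 1/sqrt(k) scaling are assumptions
    (a common choice for random projections), not taken from the report.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(matrix, point):
    """Multiply the k x d matrix by a d-dimensional point."""
    return [sum(r * p for r, p in zip(row, point)) for row in matrix]


def range_query(points, center, radius):
    """Brute-force stand-in for a spatial index lookup.

    Returns the indices of all points within Euclidean distance
    `radius` of `center`.
    """
    return [i for i, p in enumerate(points) if math.dist(p, center) <= radius]


if __name__ == "__main__":
    # Three hypothetical objects observed at 50 time steps (100 dimensions).
    rng = random.Random(42)
    base = [(t * 1.0, t * 0.5) for t in range(50)]
    near = [(x + rng.uniform(-0.1, 0.1), y + rng.uniform(-0.1, 0.1)) for x, y in base]
    far = [(x + 100.0, y + 100.0) for x, y in base]

    points = [trajectory_to_point(t) for t in (base, near, far)]
    matrix = random_projection_matrix(d=100, k=8)
    reduced = [project(matrix, p) for p in points]

    # Query around the first object in the reduced 8-dimensional space.
    print(range_query(reduced, reduced[0], radius=5.0))
```

Because a random projection preserves distances only approximately and only with some probability, the radius used in the reduced space would need to be chosen in line with the approximation guarantee; the value above is arbitrary.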
Similar resources
School of IT Technical Report DATA PREPARATION FOR MINING COMPLEX PATTERNS IN LARGE SPATIAL DATABASES
The aim of the thesis is to design an efficient algorithm for data preparation in large spatial databases for the purpose of data mining. To find complex spatial patterns, the raw data needs to be converted into a set of cliques. In our case the raw data was a 1% sample from the Sloan Digital Sky Survey database which contains 818 Gigabytes of astronomical informati...
SparseDTW: A Novel Approach to Speed up Dynamic Time Warping
We present a new space-efficient approach (SparseDTW) to compute the Dynamic Time Warping (DTW) distance between two time series that always yields the optimal result. This is in contrast to other known approaches which typically sacrifice optimality to attain space efficiency. The main idea behind our approach is to dynamically exploit the existence of similarity and/or correlation between...
School of IT Technical Report PARAMETER-FREE CLASSIFICATION FOR IMBALANCED DATA SCORING USING COMPLEMENT CLASS SUPPORT TECHNICAL REPORT 597 BAVANI ARUNASALAM AND SANJAY CHAWLA THE UNIVERSITY OF SYDNEY
In this paper we propose a score metric to facilitate classification in data sets which have an imbalanced class distribution. The score metric is based on the rules generated using an “Associative Classifier” except that instead of using support we use the Complement Class Support (CCS) measure that we have recently proposed. The advantage of CCS is that only positively correlated rules are gen...
MV3R-Tree: A Spatio-Temporal Access Method for Timestamp and Interval Queries
Among the various types of spatio-temporal queries, the most common ones involve window queries in time. In particular, timestamp (or timeslice) queries retrieve all objects that intersect a window at a specific timestamp. Interval queries include multiple (usually consecutive) timestamps. Although several indexes have been developed for either type, currently there does not exist a structure t...
New Methods for Mining Sequential and Time Series Data
Data mining is the process of extracting knowledge from large amounts of data. It covers a variety of techniques aimed at discovering diverse types of patterns on the basis of the requirements of the domain. These techniques include association rules mining, classification, cluster analysis and outlier detection. The availability of applications that produce massive amounts of spatial, spatio-t...
Publication date: 2006